Improve High Performance Factor for Floating Point SRT Division
Abstract
The execution performance of the Sweeney, Robertson, Tocher (SRT) division algorithm depends on two parameters: the radix and the redundancy factor. This paper studies the effect of these parameters on division performance. At each iteration, the SRT algorithm multiplies by the quotient digit. This multiplication reduces to a simple shift if the digit is a power of two; otherwise, the SRT iteration needs a multiplier. We propose an approach that circumvents this multiplication by decomposing the quotient digit into two or three terms, each a power of two. The multiplication is then carried out by simple shifts and a carry-save addition. Implemented on Virtex-II field-programmable gate array (FPGA) circuits, this approach performs better than the approach that uses the embedded 18 x 18-bit multipliers. The iteration delays are independent of the operand sizes. The reduction tree delays are at most equivalent to the delay of two Virtex-II slices. The approach was tested for radices 4, 8, and 16, each with the minimum and the maximum redundancy factor. From this study, we conclude that, with our approach, radix 8 with the maximum redundancy factor gives the best performance for double-precision SRT division.

INTRODUCTION

The design of fast dividers is an important issue in high-speed floating-point computing because division accounts for a significant fraction of the total arithmetic operations. It has been shown in [1] that although division and related functions seem to be relatively unimportant instructions, accounting for about 3% of the floating-point instruction count, in terms of latency they play a much larger role than multiplication. As the performance gap widens between addition/subtraction, multiplication, and division, floating-point algorithms and applications have slowly been rewritten to avoid the use of division [2].
It is not the relatively infrequent use of division that makes it less studied; rather, the complexity of its algorithms and the corresponding design challenges have led to its neglect. Until recently, implementing floating-point units on field-programmable gate arrays (FPGAs) was virtually impossible. Now, with the rapid increase in FPGA density and speed, it is becoming possible to implement high-performance floating-point division on FPGAs. This paper is concerned with the floating-point computation of Sweeney, Robertson, Tocher (SRT) division, the most common digit-recurrence division algorithm implemented in modern CPUs such as the Pentium and UltraSparc-64 processors [3]. The algorithm takes its name from the initials of Sweeney, Robertson, and Tocher, who developed it at approximately the same time.
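The multiplier-free iteration described in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design: it assumes the non-adjacent form as the way to split a quotient digit into signed powers of two, uses exact rounding in place of a real quotient-digit selection table, and works on plain Python integers. The names `signed_power_terms`, `carry_save_add`, `multiply_by_digit`, and `srt_divide` are hypothetical.

```python
from fractions import Fraction

def signed_power_terms(digit):
    """Return [(sign, shift), ...] with digit == sum(sign * 2**shift).

    Uses the non-adjacent form, an assumed choice of decomposition.
    """
    terms, shift = [], 0
    while digit != 0:
        if digit & 1:
            sign = 2 - (digit & 3)  # +1 when digit % 4 == 1, -1 when digit % 4 == 3
            terms.append((sign, shift))
            digit -= sign
        digit >>= 1
        shift += 1
    return terms

def carry_save_add(a, b, c):
    """3:2 compressor: a + b + c == sum_word + carry_word, with no carry propagation."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

def multiply_by_digit(divisor, digit):
    """Form digit * divisor from shifted copies of the divisor and one carry-save addition."""
    terms = signed_power_terms(digit)
    assert len(terms) <= 3, "digit needs more than three terms"
    operands = [sign * (divisor << shift) for sign, shift in terms]
    operands += [0] * (3 - len(operands))  # pad to exactly three operands
    sum_word, carry_word = carry_save_add(*operands)
    # Hardware would keep the carry-save pair; it is resolved here only for checking.
    return sum_word + carry_word

def srt_divide(x, d, radix, a, steps):
    """Digit recurrence w <- radix * w - q * d with digits q in [-a, a].

    Digit selection by exact rounding is a stand-in for a real selection table.
    Assumes d > 0 and |x| * (radix - 1) <= a * d.
    Returns (Q, w) such that x * radix**steps == Q * d + w.
    """
    quotient, w = 0, x
    for _ in range(steps):
        q = max(-a, min(a, round(Fraction(radix * w, d))))
        w = radix * w - multiply_by_digit(d, q)
        quotient = radix * quotient + q
    return quotient, w

# Radix 8 with maximum redundancy (digits -7..7), three iterations of 3/7.
q, w = srt_divide(3, 7, radix=8, a=7, steps=3)
assert 3 * 8**3 == q * 7 + w
```

With digits up to 15 in magnitude, the non-adjacent form never needs more than three terms, which is consistent with the "two or three terms" stated in the abstract. The redundancy factor enters through `a`: it is a/(radix - 1), so a = radix/2 gives the minimum and a = radix - 1 the maximum.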
Similar resources
The Design and Implementation of a High-performance Floating-point Divider
The increasing computation requirements of modern computer applications have stimulated a large interest in developing extremely high-performance floating-point dividers. A variety of division algorithms are available, with SRT being utilized in many computer systems. A careful analysis of SRT divider topologies has demonstrated that a relatively simple divider designed in an aggressive circuit...
SRT Division Architectures and Implementations
SRT dividers are common in modern floating point units. Higher division performance is achieved by retiring more quotient bits in each cycle. Previous research has shown that realistic stages are limited to radix-2 and radix-4. Higher radix dividers are therefore formed by a combination of low-radix stages. In this paper, we present an analysis of the effects of radix-2 and radix-4 SRT divider ...
SRT Division: Architectures, Models, and Implementations
SRT dividers are common in modern floating point units. Higher division performance is achieved by retiring more quotient bits in each cycle. Previous research has shown that realistic stages are limited to radix-2 and radix-4. Higher radix dividers are therefore formed by a combination of low-radix stages. In this paper, we present an analysis of the effects of radix-2 and radix-4 SRT divider ...
A Floating Point Divider Performing IEEE Rounding and Quotient Conversion in Parallel
Processing floating point division generally consists of SRT recurrence, quotient conversion, rounding, and normalization steps. In the rounding step, a high speed adder is required for increment operation, increasing the overall execution time. In this paper, a floating point divider performing quotient conversion and rounding in parallel is presented by analyzing the operational characteristi...
Decimal SRT Square Root: Algorithm and Architecture
Given the popularity of decimal arithmetic, hardware implementation of decimal operations has been a hot topic of research in recent decades. Besides the four basic operations, the square root can be implemented as an instruction directly in the hardware, which improves the performance of the decimal floating-point unit in the processors. Hardware implementation of decimal square rooters is usu...